fix(node): gracefully clean up iota-node validator components #6831
Merged
jkrvivian reviewed May 9, 2025
jkrvivian approved these changes May 9, 2025
3d14434 to f7d54b1
piotrm50 approved these changes May 22, 2025
lgtm
nmrshll reviewed May 22, 2025
nmrshll reviewed May 22, 2025
muXxer approved these changes May 22, 2025
bingyanglin approved these changes May 23, 2025
LGTM!
872ae68 to f7141f3
ea7d282 to 220f7f2
miker83z approved these changes May 27, 2025
semenov-vladyslav added a commit that referenced this pull request May 27, 2025
# Description of change

After a validator node leaves the committee, a few resources in validator components are leaked, including metrics and the gRPC server. If the node later tries to re-join the committee, it crashes because those resources are still in use.

This PR:

- changes the way validator metrics are registered: a dedicated registry is used instead of the default one;
- adds a graceful shutdown of the validator gRPC server and cleans up the metrics registry during reconfiguration when a validator leaves the committee;
- adds a reproducer test.

## Notes

There are a few ways to handle the metrics issue:

- ignore the `AlreadyReg` error while trying to re-register metrics with the default registry; this requires special unwrap handling for each metric;
- unregister individual metrics from the default registry; the existing components do not provide such functionality, and every individual metric would have to be unregistered;
- use a dedicated registry and remove it entirely from the registry service once the node is no longer a validator; this seems to be the easiest and cleanest way.

:warning: There may be other undetected leaked resources that do not cause an error when validator components are re-created.

## Links to any relevant issues

Fixes #6277.

## Type of change

- Bug fix (a non-breaking change which fixes an issue)

## How the change has been tested

```sh
scripts/simtest/cargo-simtest simtest --package iota-e2e-tests --test "reconfiguration_tests" --profile ci -- test_reconfig_with_same_validator
```

## Change checklist

- [x] I have followed the contribution guidelines for this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have checked that new and existing unit tests pass locally with my changes

### Release Notes

- [ ] Protocol:
- [x] Nodes (Validators and Full nodes): Fixed a bug causing iota-node to crash after re-joining the consensus committee as a validator.
- [ ] Indexer:
- [ ] JSON-RPC:
- [ ] GraphQL:
- [ ] CLI:
- [ ] Rust SDK:
- [ ] REST API:

---------

Co-authored-by: muXxer <[email protected]>
alexsporn pushed a commit that referenced this pull request May 28, 2025
…onents (#6831) (#7093)
Description of change
After a validator node leaves the committee, a few resources in validator components are leaked, including metrics and the gRPC server. If the node later tries to re-join the committee, it crashes because those resources are still in use.
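To make the metrics half of this failure concrete, here is a minimal std-only Rust sketch. It is not the iota-node code: `Registry`, `RegistryService`, `MetricsError::AlreadyReg` and the metric names are hypothetical stand-ins that only mimic how a Prometheus-style registry rejects a second metric with the same name. Registering validator metrics in the long-lived default registry fails on re-join, while a dedicated registry that is removed from the service when the node leaves the committee can be re-created cleanly.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for a registry's "already registered" error.
#[derive(Debug, PartialEq)]
enum MetricsError {
    AlreadyReg(String),
}

// Hypothetical minimal registry that only tracks metric names.
#[derive(Default)]
struct Registry {
    names: HashSet<String>,
}

impl Registry {
    // Like a Prometheus-style registry, reject a second metric with the same name.
    fn register(&mut self, name: &str) -> Result<(), MetricsError> {
        if !self.names.insert(name.to_string()) {
            return Err(MetricsError::AlreadyReg(name.to_string()));
        }
        Ok(())
    }
}

// Hypothetical registry service: one long-lived default registry plus
// extra registries that can be added and removed by id.
#[derive(Default)]
struct RegistryService {
    default_registry: Registry,
    extra: HashMap<u64, Registry>,
    next_id: u64,
}

impl RegistryService {
    fn add(&mut self, registry: Registry) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.extra.insert(id, registry);
        id
    }

    // Dropping the registry drops every metric registered in it at once.
    fn remove(&mut self, id: u64) -> bool {
        self.extra.remove(&id).is_some()
    }
}

// Hypothetical validator metric names.
const VALIDATOR_METRICS: [&str; 2] = ["validator_certificates_total", "validator_grpc_requests_total"];

// Old behaviour: validator metrics go into the default registry, which
// outlives the validator components.
fn join_with_default_registry(service: &mut RegistryService) -> Result<(), MetricsError> {
    for &name in VALIDATOR_METRICS.iter() {
        service.default_registry.register(name)?;
    }
    Ok(())
}

// New behaviour: validator metrics go into a dedicated registry that is
// handed to the service and can later be removed as a whole.
fn join_with_dedicated_registry(service: &mut RegistryService) -> Result<u64, MetricsError> {
    let mut registry = Registry::default();
    for &name in VALIDATOR_METRICS.iter() {
        registry.register(name)?;
    }
    Ok(service.add(registry))
}

// Leaving the committee removes the dedicated registry in one call.
fn leave_committee(service: &mut RegistryService, registry_id: u64) -> bool {
    service.remove(registry_id)
}
```

Removing the whole registry in a single `remove` call is what makes this option simpler than ignoring `AlreadyReg` per metric or unregistering each metric individually.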
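This PR:
- changes the way validator metrics are registered: a dedicated registry is used instead of the default one;
- adds a graceful shutdown of the validator gRPC server and cleans up the metrics registry during reconfiguration when a validator leaves the committee;
- adds a reproducer test.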
Notes
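There are a few ways to handle the metrics issue:
- ignore the `AlreadyReg` error while trying to re-register metrics with the default registry; this requires special unwrap handling for each metric;
- unregister individual metrics from the default registry; the existing components do not provide such functionality, and every individual metric would have to be unregistered;
- use a dedicated registry and remove it entirely from the registry service once the node is no longer a validator; this seems to be the easiest and cleanest way.
:warning: There may be other undetected leaked resources that do not cause an error when validator components are re-created.
Links to any relevant issues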
Fixes #6277.
Type of change
- Bug fix (a non-breaking change which fixes an issue)
How the change has been tested
```sh
scripts/simtest/cargo-simtest simtest --package iota-e2e-tests --test "reconfiguration_tests" --profile ci -- test_reconfig_with_same_validator
```
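The leave-then-rejoin cycle that `test_reconfig_with_same_validator` exercises can be reduced to a std-only Rust sketch of why the gRPC server has to be shut down gracefully. This is an illustration, not the iota-node implementation: a plain `TcpListener` stands in for the real async gRPC server, and `ServerHandle`, `serve` and `leave_and_rejoin` are hypothetical names. Shutdown first signals the accept loop and then joins its thread, so the listener is dropped and the port is free again when the node re-joins.

```rust
use std::io::{self, ErrorKind};
use std::net::{SocketAddr, TcpListener};
use std::sync::mpsc::{self, Receiver, Sender, TryRecvError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

// Hypothetical stand-in for the validator gRPC server: a listening socket
// served by a background thread that can be told to stop.
struct ServerHandle {
    addr: SocketAddr,
    shutdown_tx: Sender<()>,
    join: JoinHandle<()>,
}

impl ServerHandle {
    fn start(addr: SocketAddr) -> io::Result<ServerHandle> {
        let listener = TcpListener::bind(addr)?;
        // Non-blocking accept lets the loop poll for the shutdown signal.
        listener.set_nonblocking(true)?;
        let addr = listener.local_addr()?;
        let (shutdown_tx, shutdown_rx) = mpsc::channel();
        let join = thread::spawn(move || serve(listener, shutdown_rx));
        Ok(ServerHandle { addr, shutdown_tx, join })
    }

    // Graceful shutdown: signal the accept loop, then wait for it to exit so
    // the listener is dropped and the port is released before returning.
    fn shutdown(self) {
        let _ = self.shutdown_tx.send(());
        let _ = self.join.join();
    }
}

fn serve(listener: TcpListener, shutdown_rx: Receiver<()>) {
    loop {
        match shutdown_rx.try_recv() {
            Ok(()) | Err(TryRecvError::Disconnected) => break,
            Err(TryRecvError::Empty) => {}
        }
        match listener.accept() {
            // A real server would handle the connection; the sketch drops it.
            Ok((_stream, _peer)) => {}
            Err(e) if e.kind() == ErrorKind::WouldBlock => {
                thread::sleep(Duration::from_millis(10));
            }
            Err(_) => break,
        }
    }
    // `listener` goes out of scope here, closing the socket.
}

// Leave the committee (shut the server down), then re-join on the same address.
fn leave_and_rejoin() -> io::Result<SocketAddr> {
    let first = ServerHandle::start(SocketAddr::from(([127, 0, 0, 1], 0)))?;
    let addr = first.addr;
    first.shutdown();
    // Binding again works because the first listener has been closed.
    let second = ServerHandle::start(addr)?;
    let rebound = second.addr;
    second.shutdown();
    Ok(rebound)
}
```

Joining the thread, rather than only sending the signal, is the important design choice: it guarantees the socket is closed before the re-joining components try to bind the same address.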
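Change checklist
- [x] I have followed the contribution guidelines for this project
- [x] I have performed a self-review of my own code
- [x] I have commented my code, particularly in hard-to-understand areas
- [x] I have added tests that prove my fix is effective or that my feature works
- [x] I have checked that new and existing unit tests pass locally with my changes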
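Release Notes
- [ ] Protocol:
- [x] Nodes (Validators and Full nodes): Fixed a bug causing iota-node to crash after re-joining the consensus committee as a validator.
- [ ] Indexer:
- [ ] JSON-RPC:
- [ ] GraphQL:
- [ ] CLI:
- [ ] Rust SDK:
- [ ] REST API: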